Web Scraping

Web scraping, also called web data mining or web harvesting, is the technique of building an agent that can automatically download, parse, extract and organize useful information from the web.

Web Crawling vs Web Scraping

| **Web crawling** | **Web scraping** |
| --- | --- |
| 1) Refers to downloading and storing the contents of a large number of websites. | 1) Refers to extracting individual data elements from a website by using its specific site structure. |
| 2) It is mostly done on a large scale. | 2) It can be implemented at any scale. |
| 3) It gives generalized information. | 3) It gives specific information. |
| 4) It is typically used by search engines like Google, Yahoo and Microsoft. | 4) It is typically used by companies of any size, mostly in the data acquisition stage of data analytics. |

Working of a Web Scraper

A web scraper is a program or script that downloads the contents of multiple web pages and extracts data from them.

1) Visiting the Website : The scraper visits the website.

2) Downloading the Contents : The scraper downloads the requested content from multiple pages.

3) Extracting the Data : The data on a website is typically HTML and mostly not in a structured format. In this step the scraper parses the downloaded content and extracts structured data from it.

4) Storing the Data : The scraper saves the extracted data in a chosen format (CSV, JSON, database, PDF, HTML).

5) Analyzing the Data : After all the previous steps have completed successfully, the scraper analyzes the data.
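
The five steps above can be sketched end to end in a few lines. This is only an illustration under stated assumptions: the page content, the `product` class and the `data-price` attribute are made up, the download step is replaced by a hard-coded HTML string so the sketch runs offline, and Python's standard library (`html.parser`, `csv`) stands in for the scraping libraries.

```python
import csv
import io
from html.parser import HTMLParser

# Steps 1-2 (visit + download) are replaced by a hard-coded page so the
# sketch runs without network access. The markup below is hypothetical.
SAMPLE_PAGE = """
<html><body>
  <ul>
    <li class="product" data-price="12.50">Notebook</li>
    <li class="product" data-price="3.20">Pencil</li>
    <li class="note">Free shipping over 20</li>
  </ul>
</body></html>
"""


class ProductParser(HTMLParser):
    """Step 3: collect name and price from <li class="product"> elements."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._price = None  # set only while inside a product <li>

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "li" and attrs.get("class") == "product":
            self._price = float(attrs["data-price"])

    def handle_data(self, data):
        if self._price is not None and data.strip():
            self.rows.append({"name": data.strip(), "price": self._price})

    def handle_endtag(self, tag):
        if tag == "li":
            self._price = None


def extract(html):
    parser = ProductParser()
    parser.feed(html)
    return parser.rows


def store_as_csv(rows):
    """Step 4: serialize the rows as CSV text (writing to a file works the same way)."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["name", "price"])
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


rows = extract(SAMPLE_PAGE)
csv_text = store_as_csv(rows)
# Step 5: a trivial "analysis" of the extracted data.
total_price = sum(row["price"] for row in rows)
```

In a real scraper the hard-coded string would be replaced by the downloaded page, and the hand-written parser by a parsing library.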

requests

pip install requests
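
Once the library is installed, downloading a page might look like the sketch below. The helper name `fetch_html` and the 10 second timeout are assumptions made for illustration; `requests.get`, `raise_for_status` and `response.text` are part of the requests library.

```python
import requests


def fetch_html(url, timeout=10.0):
    """Download one page and return its HTML as text.

    Raises requests.HTTPError if the server answers with a 4xx/5xx status.
    """
    response = requests.get(url, timeout=timeout)
    response.raise_for_status()  # fail loudly instead of parsing an error page
    return response.text


# Example call (needs network access, so it is left commented out):
# html = fetch_html("https://example.com")
# print(html[:100])
```

Setting a timeout is a deliberate choice here: without one, a request to an unresponsive server can block the scraper indefinitely.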

Beautiful Soup

pip install beautifulsoup4
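
After installation, the library is imported from the `bs4` package. The following is a minimal sketch of parsing a page and reading a few values from it; the HTML string is a made-up sample, and Python's built-in `html.parser` is chosen as the parser so that nothing else needs to be installed.

```python
from bs4 import BeautifulSoup

# Hypothetical page content; in practice this would be the downloaded HTML.
html = """
<html>
  <head><title>Sample page</title></head>
  <body>
    <a href="/first">First link</a>
    <a href="/second">Second link</a>
  </body>
</html>
"""

soup = BeautifulSoup(html, "html.parser")

page_title = soup.title.string                            # text inside <title>
links = [a["href"] for a in soup.find_all("a")]           # href of every <a> tag
link_texts = [a.get_text() for a in soup.find_all("a")]   # visible link text
```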

Doctype: it contains information about the type of the document (the DOCTYPE declaration)

NavigableString: it represents the text found in an HTML document

Tag: it represents an HTML element; it holds the element's name and attributes and can contain nested tags and text
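
The object types listed above can be inspected directly on a parsed document. The sketch below uses a made-up HTML string and assumes the built-in `html.parser`; the classes themselves are imported from `bs4.element`.

```python
from bs4 import BeautifulSoup
from bs4.element import Doctype, NavigableString, Tag

# Hypothetical document used only to show the object types.
html = "<!DOCTYPE html><html><body><p class='intro'>Hello <b>world</b></p></body></html>"
soup = BeautifulSoup(html, "html.parser")

doctype = soup.contents[0]            # the DOCTYPE declaration
paragraph = soup.find("p")            # a Tag with a name and attributes
first_text = paragraph.contents[0]    # the text "Hello " inside the <p>
nested_tag = paragraph.b              # a Tag nested inside another Tag

print(type(doctype).__name__, type(paragraph).__name__, type(first_text).__name__)
print(paragraph.name, paragraph.attrs)
```

Note that `Doctype` is itself a specialized kind of `NavigableString`, so an `isinstance` check against `NavigableString` also matches it.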

Fetching data on the basis of classes
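
A minimal sketch of selecting elements by their CSS class with Beautiful Soup follows. The HTML and the class names (`quote`, `text`, `author`, `ad`) are invented for the example; the `class_` keyword argument of `find` and `find_all` is the library's way to filter by class, since `class` is a reserved word in Python.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: two quote blocks and one unrelated block.
html = """
<div class="quote featured"><span class="text">First quote</span><span class="author">Ann</span></div>
<div class="quote"><span class="text">Second quote</span><span class="author">Bob</span></div>
<div class="ad">Buy now</div>
"""
soup = BeautifulSoup(html, "html.parser")

quotes = []
# class_="quote" also matches elements that carry extra classes,
# such as class="quote featured".
for block in soup.find_all("div", class_="quote"):
    quotes.append({
        "text": block.find("span", class_="text").get_text(),
        "author": block.find("span", class_="author").get_text(),
    })

ads = soup.find_all(class_="ad")  # the tag name can be omitted
```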